{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyP5ncXXfWeUVcVSos1Kg4fy",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/sugarforever/LangChain-Advanced/blob/main/05_SelfQuerying_Retriever/05_Self_Querying_Retriever.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 05 Self-Querying Retriever\n",
        "\n",
        "A self-querying retriever, as the name implies, possesses the capability to initiate queries to itself.\n",
        "\n",
        "When presented with a natural language query, this retriever employs a query-constructing LLM chain to create a structured query. It then utilizes this structured query to interact with its VectorStore, enabling it to not just assess semantic similarity between the user-input query and stored documents but also to discern and execute filters based on user queries related to document metadata.\n",
        "\n"
      ],
      "metadata": {
        "id": "cUbNOf6aAD2w"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Key Takeaway\n",
        "\n",
        "1. Self querying is done on the metadata of the documents\n",
        "2. Query-constructing LLM chain is used to generate the query parameters and translate it to underlying vector store specific query (structured)."
      ],
      "metadata": {
        "id": "mG_zHbCgiFMD"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Example"
      ],
      "metadata": {
        "id": "ZCMb8QtOBawT"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "xPWZLTgY5E_4",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "7e3fb26c-7eaa-4f9a-af72-1bc5da7f6d2b"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.9/1.9 MB\u001b[0m \u001b[31m19.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.0/77.0 kB\u001b[0m \u001b[31m9.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m448.1/448.1 kB\u001b[0m \u001b[31m40.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m67.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m108.9/108.9 kB\u001b[0m \u001b[31m12.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m40.1/40.1 kB\u001b[0m \u001b[31m4.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.4/2.4 MB\u001b[0m \u001b[31m86.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m66.3/66.3 kB\u001b[0m \u001b[31m8.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m59.5/59.5 kB\u001b[0m \u001b[31m5.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m5.4/5.4 MB\u001b[0m \u001b[31m99.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.2/6.2 MB\u001b[0m \u001b[31m101.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.8/3.8 MB\u001b[0m \u001b[31m104.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m67.3/67.3 kB\u001b[0m \u001b[31m9.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
            "  Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
            "  Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m593.7/593.7 kB\u001b[0m \u001b[31m49.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.4/49.4 kB\u001b[0m \u001b[31m5.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m67.0/67.0 kB\u001b[0m \u001b[31m7.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m46.0/46.0 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m295.0/295.0 kB\u001b[0m \u001b[31m32.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m7.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m341.4/341.4 kB\u001b[0m \u001b[31m33.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.2/4.2 MB\u001b[0m \u001b[31m107.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m79.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m129.9/129.9 kB\u001b[0m \u001b[31m14.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.8/86.8 kB\u001b[0m \u001b[31m10.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Building wheel for pypika (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n"
          ]
        }
      ],
      "source": [
        "!pip install -q -U langchain openai chromadb tiktoken lark"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "os.environ['OPENAI_API_KEY'] = \"your valid openai api key\""
      ],
      "metadata": {
        "id": "eVQjKJn-5ZZZ"
      },
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from typing import Collection\n",
        "from langchain.llms import OpenAI\n",
        "from langchain.schema import Document\n",
        "from langchain.embeddings.openai import OpenAIEmbeddings\n",
        "from langchain.vectorstores import Chroma\n",
        "from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
        "from langchain.chains.query_constructor.base import AttributeInfo\n",
        "\n",
        "docs = [\n",
        "    Document(page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\", metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"action\"}),\n",
        "    Document(page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\", metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2}),\n",
        "    Document(page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\", metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6}),\n",
        "    Document(page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\", metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3}),\n",
        "    Document(page_content=\"Toys come alive and have a blast doing so\", metadata={\"year\": 1995, \"genre\": \"animated\"}),\n",
        "    Document(page_content=\"Three men walk into the Zone, three men walk out of the Zone\", metadata={\"year\": 1979, \"rating\": 9.9, \"director\": \"Andrei Tarkovsky\", \"genre\": \"thriller\", \"rating\": 9.9})\n",
        "]\n",
        "vectorstore = Chroma.from_documents(\n",
        "    docs, OpenAIEmbeddings(), collection_name=\"self_querying\"\n",
        ")\n",
        "\n",
        "metadata_field_info=[\n",
        "    AttributeInfo(\n",
        "        name=\"genre\",\n",
        "        description=\"The genre of the movie\",\n",
        "        type=\"string\",\n",
        "    ),\n",
        "    AttributeInfo(\n",
        "        name=\"year\",\n",
        "        description=\"The year the movie was released\",\n",
        "        type=\"integer\",\n",
        "    ),\n",
        "    AttributeInfo(\n",
        "        name=\"director\",\n",
        "        description=\"The name of the movie director\",\n",
        "        type=\"string\",\n",
        "    ),\n",
        "    AttributeInfo(\n",
        "        name=\"rating\",\n",
        "        description=\"A 1-10 rating for the movie\",\n",
        "        type=\"float\"\n",
        "    ),\n",
        "]"
      ],
      "metadata": {
        "id": "7qjgj56XSn5E"
      },
      "execution_count": 3,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "document_content_description = \"Brief summary of a movie\"\n",
        "llm = OpenAI(temperature=0)\n",
        "retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)"
      ],
      "metadata": {
        "id": "S5NvjPdlhYDf"
      },
      "execution_count": 28,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Ask with a regular query"
      ],
      "metadata": {
        "id": "GDnavYa8hbY6"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "retriever.get_relevant_documents(\"What are some movies about dinosaurs\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "JcR0ioK_hauq",
        "outputId": "0e3b32bc-a82d-4471-9567-33aefce92a12"
      },
      "execution_count": 29,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'genre': 'action', 'rating': 7.7, 'year': 1993}),\n",
              " Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995}),\n",
              " Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'director': 'Andrei Tarkovsky', 'genre': 'thriller', 'rating': 9.9, 'year': 1979}),\n",
              " Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'director': 'Satoshi Kon', 'rating': 8.6, 'year': 2006})]"
            ]
          },
          "metadata": {},
          "execution_count": 29
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Ask with a filter"
      ],
      "metadata": {
        "id": "TcceqQVIhkSM"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "retriever.get_relevant_documents(\"I want to watch a movie rated lower than 8\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Hb2lCXmfhnYh",
        "outputId": "9bc14370-6629-48bf-df70-4c9982bc81af"
      },
      "execution_count": 30,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'genre': 'action', 'rating': 7.7, 'year': 1993})]"
            ]
          },
          "metadata": {},
          "execution_count": 30
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Ask with a query containing a filter"
      ],
      "metadata": {
        "id": "zpoXP2C_htVy"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "retriever.get_relevant_documents(\"Has Greta Gerwig directed any movies about women\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "uDh-KYGXhxIq",
        "outputId": "1b8dd19f-2cd5-4307-f6b6-8b958274d91b"
      },
      "execution_count": 31,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'director': 'Greta Gerwig', 'rating': 8.3, 'year': 2019})]"
            ]
          },
          "metadata": {},
          "execution_count": 31
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Ask with a composite filter"
      ],
      "metadata": {
        "id": "XeKaxVubh2yb"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "retriever.get_relevant_documents(\"What's a highly rated (above 8.5) triller film?\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "JGNgDZ8vh5YZ",
        "outputId": "e9ff4793-4f93-4dcb-a2f5-212fa1908229"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'director': 'Andrei Tarkovsky', 'genre': 'thriller', 'rating': 9.9, 'year': 1979})]"
            ]
          },
          "metadata": {},
          "execution_count": 9
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Ask with a query and composite filter"
      ],
      "metadata": {
        "id": "vSahhaO-h8q4"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "retriever.get_relevant_documents(\"What's an animated movie that's all about toys and after 1990\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "cMlp7yHBh_g7",
        "outputId": "b4efa47c-d3aa-4022-fb0d-eedae012da1a"
      },
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995})]"
            ]
          },
          "metadata": {},
          "execution_count": 32
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "retriever.query_constructor.invoke({\"query\": \"What's an animated movie that's all about toys and after 1990\"})"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "--RrtZcC-i9l",
        "outputId": "0a73ffc6-10dc-41f1-a489-809ae9cf823f"
      },
      "execution_count": 33,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "StructuredQuery(query='toys', filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='animated'), Comparison(comparator=<Comparator.GTE: 'gte'>, attribute='year', value=1990)]), limit=None)"
            ]
          },
          "metadata": {},
          "execution_count": 33
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "retriever.query_constructor.invoke({\"query\": \"Show me one movie that's rated higher than 8\"})"
      ],
      "metadata": {
        "id": "jXHAFlk5Dl9t"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Enable the limit\n",
        "\n",
        "The parameter `enable_limit` in `SelfQueryRetriever.from_llm` can be used to enable limit, which can allow developers to specify how many records to retrieve"
      ],
      "metadata": {
        "id": "jwZBE-zWDJ6a"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "retriever = SelfQueryRetriever.from_llm(\n",
        "    llm,\n",
        "    vectorstore,\n",
        "    document_content_description,\n",
        "    metadata_field_info,\n",
        "    verbose=True,\n",
        "    enable_limit=True)"
      ],
      "metadata": {
        "id": "ySeejTuyDeIq"
      },
      "execution_count": 34,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "retriever.query_constructor.invoke({\"query\": \"Show me one movie that's rated higher than 8\"})"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "CF-_7XEfDmzV",
        "outputId": "b137af7e-a06c-4af8-9259-6685ae1fdea5"
      },
      "execution_count": 35,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "StructuredQuery(query=' ', filter=Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8), limit=1)"
            ]
          },
          "metadata": {},
          "execution_count": 35
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "retriever.get_relevant_documents(\"Show me one movie that's rated higher than 8\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "95VI28DVMZat",
        "outputId": "2f12d2b6-ea0c-4fa9-cbed-59a43004cfab"
      },
      "execution_count": 36,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'director': 'Greta Gerwig', 'rating': 8.3, 'year': 2019})]"
            ]
          },
          "metadata": {},
          "execution_count": 36
        }
      ]
    }
  ]
}